(54) Universal remote control allowing natural language modality for television and multimedia
searches and requests 



(57) The remote control unit supports multi-modal 
dialog with the user, through which the user can easily 
select programs for viewing or recording. The remote 
control houses a microphone into which the user can 
input natural language speech. The input speech is rec- 
ognized and interpreted by a natural language parser 
that extracts the semantic content of the user's speech. 
The parser works in conjunction with an electronic program
guide, through which the remote control system is
able to ascertain what programs are available for view- 
ing or recording and supply appropriate prompts to the 
user. In one embodiment, the remote control includes a
touch screen display upon which the user may view
prompts or make selections by pen input or tapping. Se- 
lections made on the touch screen automatically limit 
the context of the ongoing dialog between user and re- 
mote control, allowing the user to interact naturally with 
the unit. The remote control unit can control virtually any 
audio-video component, including those designed be- 
fore the current technology. The remote control system 
can be packaged entirely within the remote control 
handheld unit, or components may be distributed in oth- 
er systems attached to the user's multimedia equip- 
ment. 
Description

Background and Summary of the Invention 

[0001] The ubiquitous remote control, often a multitude
of them, has found its way onto virtually every coffee
table in television viewing rooms throughout the
world. Few television viewers have not experienced the
frustration of trying to perform even a simple command,
such as turning on the television and watching a
prerecorded movie, only to be thwarted because he or she
cannot figure out which button or buttons to press on
which remote control unit.

[0002] In an attempt to address the proliferation of 
multiple remote controls, many companies offer a universal
remote control that is able to operate a variety of
different audio-video components. These remote con- 
trols, of necessity, feature a panoply of buttons, many 
of them having dual functions, in order to control the 
principal functions of all devices in the user's multimedia
setup. 

[0003] While the conventional universal remote con- 
trol may eliminate the need for having multiple remote 
control units on the coffee table, it does little to simplify 
the user's interaction with his or her audio-video or multimedia
system. On the contrary, most universal remote
control units are so complex that they actually impede 
the user's ability to control the equipment. 
[0004] The present invention tackles this problem 
through speech recognition technology and sophisticated
natural language parsing components that allow
the user to simply speak into the remote control unit and 
have his or her commands carried out. While the spoken 
commands can be simple commands such as "Play 
VCR" or "Record Channel 6", the natural language parser
handles far more complex commands than this. For example,
the user could speak: "Show me a funny movie
starring Marilyn Monroe." Using the speech recognition 
and parser components, the system will search through 
an electronic program guide or movie database and can
respond to the user (for instance) that "Some Like It Hot" 
will be playing next Friday. The user could then, for ex- 
ample, instruct the system to record that movie when it 
comes on. 

[0005] Recording commands need not be limited to 
the entire movie or program. Rather, the user could en- 
ter a command such as: "Record the last five minutes 
of tonight's Toronto-Los Angeles baseball game." Again, 
the speech recognition and parser components convert 
this complex command into a sequence of actions that
cause the recording device in the user's system to make 
the requested recording at the appropriate time. 
[0006] The remote control of the invention can be con- 
structed as a self-contained unit having all of the parser 
and speech recognition components on board, or it may
be manufactured in multiple components, allowing
some of the more complex computational operations to 
be performed by a processor located in a television set,
set top box, or auxiliary multimedia control unit. In the
latter case, the hand-held remote and the remote com- 
mand unit communicate with each other by wireless 
transmission. Preferably, the hand-held remote control 
unit includes an infrared port through which the remote 
control can interact with older equipment in the user's 
multimedia setup. Thus the remote control of the inven- 
tion even allows sophisticated natural language speech 
commands to be given to those older audio-video com- 
ponents. 

[0007] For a more complete understanding of the in- 
vention, its objects and advantages, refer to the follow- 
ing specification and to the accompanying drawings. 

Brief Description of the Drawings 

[0008] 

Figure 1 is a plan view of an embodiment of the re- 
mote control in accordance with the invention; 
Figure 2 is a block diagram illustrating the compo- 
nents of the presently preferred embodiment; 
Figure 3 is a block diagram depicting the compo- 
nents of the natural language parser of the present- 
ly preferred embodiment of the invention; and 
Figure 4 is a block diagram depicting the compo- 
nents of the local parser of the presently preferred 
embodiment of the invention. 

Description of the Preferred Embodiment 

[0009] The remote control of the invention can take 
many forms. An exemplary embodiment is illustrated in 
Figure 1, where the remote control is shown at 10 and 
an exemplary television set is shown at 12. In the preferred
embodiment, the remote control 10 and television
12 communicate wirelessly with one another through a
suitable radio frequency link or infrared link.
[0010] The remote control is designed to operate not 
only more modern digital interactive television and hard
disk recorder equipment, but also older models of tele- 
visions, VCRs, DVD and laser disk players, surround 
sound processors, tuners, and the like. Accordingly, the 
remote control includes a light-emitting diode transmit- 
ter 14 with which the unit may communicate with all pop- 
ular home entertainment and multimedia components. 
This same transmitter can serve as the communication 
link between the remote control and the television (to 
implement some of the features described herein). 
[0011] In an alternate embodiment, the remote control 
10 and television 12 communicate through a bi-direc- 
tional data communication link that allows the speech 
recognition and natural language parsing components 
to be distributed among the remote control, television 
and optionally other components within the multimedia 
system. 

[0012] Although not required to implement the 
speech-enabled dialog system, the presently preferred 
remote control 10 also includes a lighted display 16 that
may supply prompts to the user as well as information 
extracted from the electronic program guide. The screen 
may be touch sensitive or tap sensitive, allowing the us- 
er to select menu options and provide handwritten input 
through the stylus 18. Users who regularly employ pen- 
based personal digital assistant (PDA) devices will find 
the stylus input modality particularly useful. 
[0013] The remote control 10 also includes a complement
of pushbuttons 20, for performing numeric channel
selection and other commonly performed operations, 
such as increasing and decreasing the audio volume. A 
jog shuttle wheel 22 may also be included, to allow the 
user to use this feature in conjunction with recorders and 
disk players. 

[0014] By virtue of the bi-directional link between remote
control 10 and television 12, the system is capable
of displaying on-screen prompts and program guide in- 
formation on both the television monitor screen, as illus- 
trated at 24, and on the display screen 16 of the remote 
control. If desired, the on-screen display 24 can be sup- 
pressed, so that the user may make menu item selec- 
tions and electronic program guide selections using the 
remote control screen, without the need to display the 
same information on the television while watching a pro- 
gram. 

[0015] A particularly useful aspect of remote control 
10 is its natural language speech modality. The remote 
control is provided with a microphone as at 26. The user 
speaks in natural language sentences, and these spo- 
ken utterances are picked up by microphone 26 and 
supplied to a sophisticated speech understanding sys- 
tem. The speech understanding system allows the user 
to give the television set and other associated equip- 
ment (such as hard disk recorders or VCR recorders) 
search and record commands in interactive, natural lan- 
guage. 

[0016] As an example of a spoken search command, 
the user could say into the microphone, "Show me a fun- 
ny movie starring Marilyn Monroe." Using its speech rec- 
ognition and parser components, this system searches 
through an electronic program guide or movie database 
and responds to the user whether any options meet the 
user's request. The system might respond, for instance, 
that "Some Like It Hot" will be playing next Friday.
[0017] Armed with this information, the user may elect 
to record the movie, by simply speaking, "Please record 
Some Like It Hot." 

[0018] Recording instructions can be quite explicit, 
thanks to the sophisticated natural language system of 
the invention. Thus, the user could enter a complex
record command such as, "Record the last five minutes
of tonight's Toronto-Los Angeles baseball game." Again,
the speech recognition and parser components convert 
this complex command into a sequence of actions that 
the recorder within the system will carry out. 
[0019] Referring to Figure 2, the major functional
components of the remote control system will now be
described. In this regard, it is important to understand
that the components of the remote control system can
be packaged entirely within the remote control device
itself, or one or more of these components can be distributed
or implemented in other components within the
system. The more processor-intensive functions of the
system may be performed, for example, by processors
located in larger, more powerful components such as
set top boxes, interactive digital television sets, multimedia
recording systems, and the like.

[0020] For example, the microphone and basic components
of the speech recognizer may be housed in the
remote control unit, with the remaining components
housed in another piece of equipment. If desired, the
speech recognizer itself can be subdivided into components,
some of which are housed in the remote control
and others of which are housed elsewhere. By way of
example, the component housed in the remote control
may process the input speech by extracting speech features
upon which the speech models are trained. The
remote control then transmits these extracted features
to the component located elsewhere for further speech
recognition processing. Alternatively, the input speech
may simply be transmitted by the remote control in the
audio domain to a speech recognition component located
elsewhere. These are of course only a few possible
examples of how the functionality of the invention may
be deployed in distributed fashion.
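
The split described in this paragraph, in which the handheld unit extracts features and transmits them for recognition elsewhere, could look roughly like the following Python sketch. The specification does not say which features are extracted; per-frame log energy, the function name `frame_log_energies`, and the frame size are assumptions chosen only for illustration.

```python
import math


def frame_log_energies(samples, frame_size=4):
    """Split samples into non-overlapping frames and return each frame's log energy.

    Hypothetical stand-in for the feature extraction step performed in the
    handheld unit; a real recognizer would use richer features.
    """
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(value * value for value in frame)
        # A small offset avoids taking the logarithm of zero on silent frames.
        features.append(math.log(energy + 1e-10))
    return features
```

Only the short feature list, rather than the raw audio, would then need to cross the wireless link to the remote recognition component.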
[0021] Speech input supplied through microphone 26
is first digitized and fed to speech recognizer module 40.
The output of speech recognizer module 40 is supplied
to the natural language parser 42. This parser works in
conjunction with a set of grammars 44 that allow the system
to interpret the meaning behind the user's spoken
instructions. In the presently preferred embodiment
these grammars are goal-oriented grammars comprising
a collection of frame sentences having one or more
slots that the system will fill in based upon the words
recognized from the user's input speech. More detail
about the presently preferred parser and these goal-oriented
grammars is presented below.
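
As a rough illustration of frame sentences with slots, the following Python sketch matches recognized text against two templates whose named groups play the role of slots. It is not the grammar format of the preferred embodiment; `FRAME_SENTENCES`, `fill_frame`, and the two templates are hypothetical and cover only the simple commands "Play VCR" and "Record Channel 6".

```python
import re

# Hypothetical frame sentences: each pattern is a template whose named
# groups act as the slots to be filled from the recognized words.
FRAME_SENTENCES = {
    "play": re.compile(r"^play (?P<device>\w+)$"),
    "record": re.compile(r"^record channel (?P<channel>\d+)$"),
}


def fill_frame(recognized_text):
    """Return (goal, slots) for the first frame sentence that matches."""
    text = recognized_text.lower().strip()
    for goal, pattern in FRAME_SENTENCES.items():
        match = pattern.match(text)
        if match:
            return goal, match.groupdict()
    # No frame sentence matched the recognized words.
    return None, {}
```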
[0022] The natural language parser 42 has access to
a stored semantic representation of the electronic program
guide 46. The electronic program guide can be
downloaded from the Internet or supplied via the entertainment
system's cable or satellite link. These sources
of electronic program guide information are illustrated
generally at 50. Typically, the television tuner 52 may be
used to obtain this information and furnish it to the semantic
representation stored at 46. Alternatively, this information
could be supplied by telephone connection to
a suitable Internet service provider or dedicated electronic
program guide service provider.
[0023] The typical electronic program guide represents
a complex hierarchical structure that breaks down
different types of program content according to type.
Thus a program guide may divide programs into different
categories, such as movies, sports, news, weather,
and the like. These categories may further be subdivided.
Thus movies may be subdivided into categories
such as comedies, drama, science fiction and so forth.
A semantic representation of the electronic program
guide contents is stored at 46, based on the same goal-oriented
grammar structure used by the natural language
parser. This allows the parser to readily find information
about what is available for viewing. If the user
has asked for comedy movies, the comedy movie portion
of the semantic representation is accessed by the
parser, and the available programs falling under this category
may then be displayed to the user as will be more
fully described below.
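
The category lookup described in this paragraph can be pictured with a nested mapping. The Python sketch below is an assumption about one possible layout, not the stored representation at 46; `PROGRAM_GUIDE`, `find_programs`, and the placeholder titles are hypothetical.

```python
# Hypothetical in-memory program guide organized by category and subcategory.
# Titles other than "Some Like It Hot" are placeholders, not real listings.
PROGRAM_GUIDE = {
    "movies": {
        "comedy": ["Some Like It Hot"],
        "drama": ["Placeholder Drama"],
    },
    "sports": {
        "baseball": ["Toronto-Los Angeles game"],
    },
}


def find_programs(guide, category, subcategory=None):
    """Return the titles under a category, optionally narrowed to a subcategory."""
    branch = guide.get(category, {})
    if subcategory is not None:
        return list(branch.get(subcategory, []))
    # No subcategory given: flatten every subcategory in guide order.
    return [title for titles in branch.values() for title in titles]
```

A request for comedy movies then touches only the `"movies"` / `"comedy"` branch instead of scanning the whole guide.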

[0024] In some instances the natural language parser 
will immediately identify a program the user is interested 
in watching. In other instances, there may be multiple 
choices, or no choices. To accommodate these many 
possibilities, the system includes a dialog manager 54. 
The dialog manager interfaces with the natural lan- 
guage parser 42, and generates interactive prompts for 
synthesized speech or on-screen presentation to the us- 
er. These prompts are designed to elicit further informa- 
tion from the user, to help the natural language parser 
find program offerings the user may be interested in. 
The dialog manager has a user profile data store 56, 
which stores information about the user's previous in- 
formation selections, and also information about how 
the user likes to have the information displayed. This 
data store thus helps the dialog manager tune its 
prompts to best suit the user's expectations.
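
The three cases named in this paragraph (one match, several matches, no match) suggest a simple branching rule for prompt generation. The following Python sketch is illustrative only; the function `next_prompt` and the prompt wording are assumptions, and the user profile data store 56 is not modelled.

```python
def next_prompt(matches):
    """Choose a follow-up prompt from the number of matching programs."""
    if not matches:
        return "No programs matched. Would you like to try another search?"
    if len(matches) == 1:
        return f"Found {matches[0]}. Would you like to watch or record it?"
    # Several candidates: ask the user to narrow the choice.
    return "Several programs matched: " + ", ".join(matches) + ". Which one?"
```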
[0025] The presently preferred natural language pars- 
er will now be described. Figure 3 depicts components 
of the natural language parser 42 in more detail. In par- 
ticular, speech understanding module 128 includes a lo- 
cal parser 160 to identify predetermined relevant task- 
related fragments. Speech understanding module 128 
also includes a global parser 162 to extract the overall 
semantics of the speaker's request. 
[0026] The local parser 160 utilizes in the preferred 
embodiment small and multiple grammars along with 
several passes and a unique scoring mechanism to pro- 
vide parse hypotheses. For example, the novel local 
parser 102 recognizes according to this approach 
phrases such as dates, names of people, and movie cat- 
egories. If a speaker utters "record me a comedy in 
which Mel Brooks stars and is shown before January 
23rd", the local parser recognizes: "comedy" as being a 
movie category; "January 23rd" as a date; and "Mel 
Brooks" as an actor. The global parser assembles those 
items (movie category, date, etc.) together and recog- 
nizes that the speaker wishes to record a movie with 
certain constraints. 
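
The division of labour between the local and global parsers can be sketched in Python using the utterance from this paragraph. The lexicons, the date pattern, and the functions `local_parse` and `global_parse` are hypothetical simplifications; the preferred embodiment uses multiple grammars, several passes and scoring rather than simple lookups.

```python
import re

# Hypothetical lexicons; the specification does not list actual vocabularies.
CATEGORIES = {"comedy", "drama"}
ACTORS = {"mel brooks", "marilyn monroe"}
DATE_PATTERN = re.compile(
    r"\b(january|february|march|april|may|june|july|august|"
    r"september|october|november|december) \d{1,2}(st|nd|rd|th)?\b"
)


def local_parse(utterance):
    """Spot task-related fragments and label each with its topic."""
    text = utterance.lower()
    fragments = {}
    for category in CATEGORIES:
        if re.search(rf"\b{category}\b", text):
            fragments["category"] = category
    for actor in ACTORS:
        if actor in text:
            fragments["actor"] = actor
    date = DATE_PATTERN.search(text)
    if date:
        fragments["date"] = date.group(0)
    return fragments


def global_parse(utterance, fragments):
    """Combine the fragments with the leading verb into one request."""
    action = utterance.lower().split()[0]
    return {"action": action, "constraints": fragments}
```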

[0027] Speech understanding module 128 includes 
knowledge database 163 which encodes the semantics 
of a domain (i.e., goal to be achieved). In this sense, 
knowledge database 163 is preferably a domain-specific
database as depicted by reference numeral 165 and
is used by dialog manager 130 to determine whether a
particular action related to achieving a predetermined
goal is possible.

[0028] The preferred embodiment encodes the semantics
via a frame data structure 164. The frame data
structure 164 contains empty slots 166 which are filled
when the semantic interpretation of global parser 162
matches the frame. For example, a frame data structure
(whose domain is tuner commands) includes an empty
slot for specifying the viewer-requested channel for a
time period. If viewer 120 has provided the channel, then
that empty slot is filled with that information. However,
if that particular frame needs to be filled after the viewer
has initially provided its request, then dialog manager
130 instructs computer response module 134 to ask
viewer 120 to provide a desired channel.
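
A frame with empty slots, and the follow-up question asked when a slot is still empty, can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the class `Frame`, the function `prompt_for_missing`, the slot names, and the prompt wording are hypothetical and do not reproduce frame data structure 164.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame whose slots start empty (None) and are filled as data arrives."""
    domain: str
    slots: dict = field(default_factory=dict)

    def fill(self, name, value):
        if name not in self.slots:
            raise KeyError(f"frame {self.domain!r} has no slot {name!r}")
        self.slots[name] = value

    def empty_slots(self):
        return [name for name, value in self.slots.items() if value is None]


def prompt_for_missing(frame):
    """Return a question for the first empty slot, or None if the frame is full."""
    missing = frame.empty_slots()
    return f"Please provide a {missing[0]}." if missing else None
```

For a tuner frame with `channel` and `time_period` slots, filling only the time period leaves the channel slot empty, so the sketch returns a request for the channel.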

[0029] The frame data structure 164 preferably includes
multiple frames which each in turn have multiple
slots. One frame may have slots directed to attributes
of a movie, director, and type of movie. Another frame
may have slots directed to attributes associated with the
time in which the movie is playing, the channel, and so
forth.

[0030] The following reference discusses global parsers
and frames: R. Kuhn and R. D. Mori, Spoken Dialogues
with Computers (Chapter 14: Sentence Interpretation),
Academic Press, Boston (1998).
[0031] Dialog manager 130 uses dialog history data
file 167 to assist in filling in empty slots before asking
the speaker for the information. Dialog history data file
167 contains a log of the conversation which has occurred
through the device of the present invention. For
example, if a speaker utters "I'd like to watch another
Marilyn Monroe movie," the dialog manager 130 examines
the dialog history data file 167 to check what movies
the user has already viewed or rejected in a previous
dialog exchange. If the speaker had previously rejected
"Some Like It Hot", then the dialog manager 130 fills the
empty slot of the movie title with movies of a different
title. If a sufficient number of slots have been filled, then
the present invention will ask the speaker to verify and
confirm the program selection. Thus, if any assumptions
made by the dialog manager 130 through the use of dialog
history data file 167 prove to be incorrect, then the
speaker can correct the assumption.
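
The use of dialog history to avoid re-offering a rejected title can be sketched in a few lines of Python. The function `propose_title`, the history layout with `viewed` and `rejected` lists, and the candidate list are assumptions made for illustration; the format of dialog history data file 167 is not specified.

```python
def propose_title(candidates, history):
    """Return the first candidate the user has not already viewed or rejected."""
    seen = set(history.get("viewed", [])) | set(history.get("rejected", []))
    for title in candidates:
        if title not in seen:
            return title
    # Every candidate was already viewed or rejected.
    return None
```

The proposed title is still only an assumption on the system's part, which is why the paragraph above has the speaker confirm the selection.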

[0032] The natural language parser 42 analyzes and
extracts semantically important and meaningful topics
from a loosely structured, natural language text which
may have been generated as the output of an automatic
speech recognition system (ASR) used by a dialogue or
speech understanding system. The natural language
parser 42 translates the natural language text input to a
new representation by generating well-structured tags
containing topic information and data, and associating
each tag with the segments of the input text containing
the tagged information. In addition, tags may be generated
in other forms such as a separate list, or as a semantic
frame.

[0033] Robustness is a feature of the natural language
parser 42, as the input can contain grammatically
incorrect English sentences for the following reasons:
the input to the recognizer is casual, dialog-style
natural speech that can contain broken sentences and
partial phrases; and the speech recognizer can insert,
omit, or misrecognize words even when the
speech input is considered correct. The natural language
parser 42 deals robustly with all types of input
and extracts as much information as possible.
[0034] Figure 4 depicts the different components of 
the local parser 160 of the natural language parser 42. 
The natural language parser 42 preferably utilizes gen- 
eralized parsing techniques in a multi-pass approach as 
a fixed-point computation. Each topic is described as a 
context-sensitive LR (left-right and rightmost derivation) 
grammar, allowing ambiguities. The following are refer- 
ences related to context-sensitive LR grammars: A. Aho 
and J. D. Ullman, Principles of Compiler Design, Addi- 
son Wesley Publishing Co., Reading, Massachusetts 
(1977); and M. Tomita, Generalized LR Parsing, Kluwer
Academic Publishers, Boston, Massachusetts (1991). 
[0035] At each pass of the computation, a generalized 
parsing algorithm is used to generate preferably all pos- 
sible (both complete and partial) parse trees independ- 
ently for each targeted topic. Each pass potentially gen- 
erates several alternative parse-trees, each parse-tree 
representing a possibly different interpretation of a par- 
ticular topic. The multiple passes through preferably 
parallel and independent paths result in a substantial 
elimination of ambiguities and overlap among different 
topics. The generalized parsing algorithm is a system- 
atic way of scoring all possible parse-trees so that the 
(N) best candidates are selected utilizing the contextual 
information present in the system. 
[0036] Local parsing system 160 is carried out in three
stages: lexical analysis 220; parallel parse-forest gen- 
eration for each topic (for example, generators 230 and 
232); and analysis and synthesis of parsed components 
as shown generally by reference numeral 234. 

Lexical analysis:

[0037] A speaker utters a phrase that is recognized 
by an automatic speech recognizer 217 which gener- 
ates input sentence 218. Lexical analysis stage 220 
identifies and generates tags for the topics (which do 
not require extensive grammars) in input sentence 218 
using lexical filters 226 and 228. These include, for ex- 
ample, movie names; category of movie; producers; 
names of actors and actresses; and the like. A regular- 
expression scan of the input sentence 218 using the 
keywords involved in the mentioned exemplary tags is 
typically sufficient at this level. Also performed at this
stage is the tagging of words in the input sentence that
are not part of the lexicon of the particular grammar. These
words are indicated using an X-tag so that such noise
words are replaced with the letter "X".
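
The X-tagging step can be illustrated with a short Python sketch. The lexicon contents and the function name `x_tag` are hypothetical; the sketch shows only the replacement of out-of-lexicon words with "X", not the lexical filters 226 and 228.

```python
import re

# Hypothetical lexicon for one topic grammar; real lexicons are not given.
LEXICON = {"record", "comedy", "before", "january", "23rd"}


def x_tag(sentence, lexicon=LEXICON):
    """Replace every word outside the lexicon with the noise marker "X"."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [word if word in lexicon else "X" for word in words]
```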



Parallel parse-forest generation: 

[0038] The parser 42 uses a high-level general parsing
strategy to describe and parse each topic separately,
and generates tags and maps them to the input stream.
Due to the nature of unstructured input text 218, each
individual topic parser preferably accepts as large a language
as possible, ignoring all but important words,
dealing with insertion and deletion errors. The parsing
of each topic involves designing context-sensitive grammar
rules using a meta-level specification language,
much like the ones used in LR parsing. Examples of
grammars include grammar A 240 and grammar B 242.
Using the present invention's approach, topic grammars
240 and 242 are described as if they were an LR-type
grammar, containing redundancies and without eliminating
shift and reduce conflicts. The result of parsing
an input sentence is all possible parses based on the
grammar specifications.
[0039] Generators 230 and 232 generate parse forests
250 and 252 for their topics. Tag-generation is done
by synthesizing actual information found in the parse
tree obtained during parsing. Tag generation is accomplished
via tag and score generators 260 and 262 which
respectively generate tags 264 and 266. Each identified
tag also carries information about what set of input
words in the input sentence are covered by the tag. Subsequently
the tag replaces its cover-set. In the preferred
embodiment, context information 267 is utilized for tag
and score generations, such as by generators 260 and
262. Context information 267 is utilized in the scoring
heuristics for adjusting weights associated with a heuristic
scoring factor technique that is discussed below.
Context information 267 preferably includes word confidence
vector 268 and dialogue context weights 269.
However, it should be understood that the parser 42 is
not limited to using both word confidence vector 268 and
dialogue context weights 269, but also includes using
one to the exclusion of the other, as well as not utilizing
context information 267.

[0040] Automatic speech recognition process block
217 generates word confidence vector 268 which indicates
how well the words in input sentence 218 were
recognized. Dialog manager 130 generates dialogue
context weights 269 by determining the state of the dialogue.
For example, dialog manager 130 asks a user
about a particular topic, such as what viewing time is
preferable. Due to this request, dialog manager 130 determines
that the state of the dialogue is time-oriented.
Dialog manager 130 provides dialogue context weights
269 in order to inform the proper processes to more
heavily weight the detected time-oriented words.
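
One way to picture dialogue context weights is as a per-word multiplier that rises when a word matches the current dialogue state. The Python sketch below is an assumption for illustration only: the vocabulary `TIME_WORDS`, the state label `"time"`, the boost factor, and the function `dialogue_context_weights` are all hypothetical.

```python
# Hypothetical vocabulary of time-oriented words.
TIME_WORDS = {"tonight", "morning", "evening", "before", "after"}


def dialogue_context_weights(words, dialogue_state, boost=2.0):
    """Return one weight per word, boosted when it matches the dialogue state."""
    weights = []
    for word in words:
        is_time_word = word.lower() in TIME_WORDS
        # Only a time-oriented dialogue state boosts time-oriented words.
        boosted = dialogue_state == "time" and is_time_word
        weights.append(boost if boosted else 1.0)
    return weights
```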

Synthesis of Tag-components: 

[0041] The topic spotting parser of the previous stage
generates a significant amount of information that needs
to be analyzed and combined together to form the final
output of the local parser. The parser 42 is preferably as
"aggressive" as possible in spotting each topic, resulting
in the generation of multiple tag candidates. Additionally,
in the presence of numbers or certain keywords, such
as "between", "before", "and", "or", "around", etc., and
especially if these words have been introduced or
dropped due to recognition errors, it is possible to construct
many alternative tag candidates. For example, an
input sentence could have insertion or deletion errors.
The combining phase determines which tags form a
more meaningful interpretation of the input. The parser
42 defines heuristics and makes a selection based on
them using an N-Best candidate selection process. Each
generated tag corresponds to a set of words in the input
word string, called the tag's cover-set.
[0042] A heuristic is used that takes into account the 
cover-sets of the tags used to generate a score. The 
score roughly depends on the size of the cover-set, the 
sizes in the number of the words of the gaps within the 
covered items, and the weights assigned to the pres- 
ence of certain keywords. In the preferred embodiment, 
ASR-derived confidence vector and dialog context in- 
formation are utilized to assign priorities to the tags. For 
example, applying channel-tags parsing first potentially
removes channel-related numbers that are easier to 
identify uniquely from the input stream, and leaves fewer 
numbers to create ambiguities with other tags. Prefera- 
bly, dialog context information is used to adjust the pri- 
orities. 
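
The dependence of the score on cover-set size, gaps and keyword weights can be sketched as below. The specification says only that the score "roughly depends" on these quantities, so the additive formula, the function name `cover_set_score`, and the representation of a cover-set as a mapping from word position to word are assumptions.

```python
def cover_set_score(cover_set, keyword_weights=None):
    """Score a tag from the word positions it covers.

    cover_set maps input word positions to the covered words. The score
    rewards coverage and keyword weights and penalizes uncovered gaps.
    The exact formula is an assumption made for illustration.
    """
    if not cover_set:
        return 0.0
    keyword_weights = keyword_weights or {}
    positions = sorted(cover_set)
    size = len(positions)
    # Words lying between the first and last covered positions but not covered.
    gaps = (positions[-1] - positions[0] + 1) - size
    bonus = sum(keyword_weights.get(word, 0.0) for word in cover_set.values())
    return size - gaps + bonus
```

A contiguous three-word date phrase therefore outscores a two-word tag whose words are separated by two uncovered words.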

N-Best Candidates Selection 

[0043] At the end of each pass, an N-best processor 
270 selects the N-best candidates based upon the 
scores associated with the tags and generates the topic- 
tags, each representing the information found in the cor- 
responding parse-tree. Once topics have been discov- 
ered this way, the corresponding words in the input can 
be substituted with the tag information. This substitution 
transformation eliminates the corresponding words from 
the current input text. The output 280 of each pass is 
fed back to the next pass as the new input, since the
substitutions may help in the elimination of certain am- 
biguities among competing grammars or help generate 
better parse-trees by filtering out overlapping symbols. 
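
Selecting the N best candidates by score is a standard operation; a minimal Python sketch using the standard library is shown below. Representing each candidate as a `(score, tag)` pair and the function name `n_best` are assumptions, and the sketch does not reproduce N-best processor 270.

```python
import heapq


def n_best(candidates, n):
    """Return the n highest-scoring (score, tag) candidates, best first."""
    return heapq.nlargest(n, candidates, key=lambda candidate: candidate[0])
```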
[0044] Computation ceases when no additional tags 
are generated in the last pass. The output of the final 
pass becomes the output of the local parser to global 
parser 162. Since each phase can only reduce the 
number of words in its input and the length of the input 
text is finite, the number of passes in the fixed-point 
computation is linearly bounded by the size of its input. 
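
The fixed-point computation can be sketched as a loop that repeats until a pass produces no new tags. In the Python example below, the taggers, the tag syntax, and the function `run_passes` are hypothetical; real topic parsers would produce scored parse forests rather than a single match per pass.

```python
def channel_tagger(tokens):
    """Tag the pattern "channel <number>" as a single channel tag."""
    for i in range(len(tokens) - 1):
        if tokens[i] == "channel" and tokens[i + 1].isdigit():
            return f"<CHANNEL:{tokens[i + 1]}>", i, i + 2
    return None


def time_tagger(tokens):
    """Tag the pattern "at <number>" as a single time tag."""
    for i in range(len(tokens) - 1):
        if tokens[i] == "at" and tokens[i + 1].isdigit():
            return f"<TIME:{tokens[i + 1]}>", i, i + 2
    return None


def run_passes(words, topic_taggers):
    """Re-run every topic tagger until a pass produces no new tags.

    Each tagger returns (tag, start, end) for a span it recognizes, or None.
    The span is replaced by its tag, so every productive pass removes at
    least one untagged word and the loop must terminate.
    """
    current = list(words)
    passes = 0
    while True:
        passes += 1
        new_tags = 0
        for tagger in topic_taggers:
            result = tagger(current)
            if result is not None:
                tag, start, end = result
                current[start:end] = [tag]
                new_tags += 1
        if new_tags == 0:
            return current, passes
```

Running the channel tagger first removes the number 6 from the input before the time tagger looks for numbers, which mirrors the ordering argument made for channel tags earlier in the description.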
[0045] The following scoring factors are used to rank 
the alternative parse trees based on the following at- 
tributes of a parse-tree: 

• Number of terminal symbols.

• Number of non-terminal symbols.

• The depth of the parse-tree.

• The size of the gaps in the terminal symbols.

• ASR-Confidence measures associated with each
terminal symbol.

• Context-adjustable weights associated with each
terminal and non-terminal symbol.

[0046] Each path preferably corresponds to a separate
topic that can be developed independently, operating
on a small amount of data, in a computationally inexpensive
way. The architecture of the parser 42 is flexible
and modular, so incorporating additional paths and
grammars for new topics, or changing heuristics for particular
topics, is straightforward. This also allows developing
reusable components that can be shared among
different systems easily.

[0047] From the foregoing it will be appreciated that
the remote control system of the invention offers a great
deal of user-friendly functionality not currently found in
any electronic program guide control system or remote
control system. While the invention has been described
in its presently preferred embodiment, it will be understood
that the invention is capable of modification without
departing from the spirit of the invention as set forth
in the appended claims.



Claims 

1. A remote control system for controlling at least one
audio/video component comprising:

a handheld case;
a microphone disposed in said case for receiving
speech input from a user;
a communication system disposed in said case
for transmitting data signals to a location remote
from said handheld case;
a speech recognizer for processing said
speech input;
a memory for storing a representation of an
electronic program guide;
a natural language parser in communication
with said speech recognizer and with said
memory, said parser being operative to extract
semantic content from said processed speech
input and to access said electronic program
guide using said extracted semantic content to
generate control instructions for said audio/video
component.

2. The remote control system of claim 1 wherein said 
speech recognizer is disposed within said handheld 
case. 

3. The remote control system of claim 1 further com- 
prising a processor component remote from said 
handheld case and wherein said speech recognizer 
is disposed in said processor component.

4. The remote control system of claim 1 wherein said 
natural language parser is disposed within said 
handheld case.

5. The remote control system of claim 1 further com- 
prising a processor component remote from said 
handheld case and wherein said natural language 
parser is disposed in said processor component.

6. The remote control system of claim 1 further com- 
prising electronic program guide acquisition system 
coupled to said memory for downloading said representation
of an electronic program guide via a telecommunications
link.



7. The remote control system of claim 6 wherein said 
telecommunications link is the internet. 

8. The remote control system of claim 6 wherein said 
telecommunications link is an audio/video program 
content delivery system. 

9. The remote control system of claim 1 wherein said 
audio/video component includes a tuner and 
wherein said remote control system communicates 
with said tuner to acquire said representation of an 
electronic program guide. 

10. The remote control system of claim 1 wherein said 
natural language parser is a task-based parser em- 
ploying a grammar comprising a plurality of frames 
having slots representing the semantic structure of 
said electronic program guide. 

11. The remote control system of claim 1 further com- 
prising a dialog manager in communication with 
said parser for generating prompts to the user 
based on said extracted semantic content. 

12. The remote control system of claim 1 further com- 
prising dialog manager having speech synthesizer 
for generating speech prompts to the user based on 
said extracted semantic content. 

13. The remote control system of claim 1 further com- 
prising digitizing tablet disposed in said handheld 
case for pen-based input of user-supplied informa- 
tion. 

14. The remote control system of claim 13 wherein said 
digitizing tablet displays prompts that are actuable 
by pen to limit the context in which said parser ex- 
tracts semantic content. 

15. The remote control system of claim 1 further comprising 
display unit disposed in said handheld case 
for providing information to the user. 
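
The arrangement recited in claims 1, 10, 11 and 14 can be illustrated with a minimal sketch. It is not the claimed implementation: the `Program`, `Frame`, `GUIDE`, `parse`, `respond` and `limit_context` names, the sample guide entries and the instruction strings are all hypothetical, and simple keyword matching stands in for the natural language parser. The sketch shows recognized text filling the slots of a task frame, the frame being matched against a stored guide, and the result being either a control instruction or a prompt back to the user.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Program:
    title: str
    channel: int
    start: str      # "HH:MM", 24-hour clock
    category: str


# Hypothetical stand-in for the stored representation of the program guide.
GUIDE = [
    Program("Evening News", 4, "18:00", "news"),
    Program("Space Documentary", 7, "20:00", "documentary"),
    Program("Late Movie", 9, "22:30", "movie"),
]


@dataclass
class Frame:
    """A task frame whose slots mirror fields of a guide entry."""
    task: str
    slots: dict = field(default_factory=lambda: {"title": None, "channel": None})


def parse(text: str, guide: list[Program]) -> Frame:
    """Toy parser (an assumption, not the patent's parser): fill slots by keyword."""
    lowered = text.lower()
    frame = Frame("record" if "record" in lowered else "watch")
    for program in guide:
        if program.title.lower() in lowered:
            frame.slots["title"] = program.title
    match = re.search(r"channel (\d+)", lowered)
    if match:
        frame.slots["channel"] = int(match.group(1))
    return frame


def respond(frame: Frame, guide: list[Program]) -> str:
    """Return a control instruction, or a prompt when the request is ambiguous."""
    matches = [
        p for p in guide
        if frame.slots["title"] in (None, p.title)
        and frame.slots["channel"] in (None, p.channel)
    ]
    if len(matches) == 1:
        p = matches[0]
        verb = "RECORD" if frame.task == "record" else "TUNE"
        return f"{verb} channel={p.channel} start={p.start}"
    if not matches:
        return "PROMPT: No matching program was found. Please try again."
    return "PROMPT: Which program? " + ", ".join(p.title for p in matches)


def limit_context(guide: list[Program], category: str) -> list[Program]:
    """Model a pen or tap selection that narrows the guide the parser consults."""
    return [p for p in guide if p.category == category]
```

In this sketch, `respond(parse("record the space documentary", GUIDE), GUIDE)` yields a record instruction for channel 7, while a vague request such as "what is on tonight" yields a prompt listing the candidate programs. Passing `limit_context(GUIDE, "movie")` in place of `GUIDE` models a selection on the tablet that restricts the context before the same vague request is interpreted.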

16. A remote control device comprising: 

a handheld case having a communication inter- 
face through which control instructions are is- 
sued to a remote component; 
a display screen disposed in said case; 
a microphone disposed in said case; 
a speech recognizer system coupled to said mi- 
crophone; 

a dialog manager coupled to said speech rec- 
ognizer system and to said display screen for 
issuing control commands through said com- 
munication interface and for displaying infor- 
mation on said display screen. 

17. The remote control device of claim 16 wherein said 
speech recognizer system includes a natural language 
parser for extracting semantic information 
from speech information input through said microphone. 

18. The remote control device of claim 16 wherein said 
speech recognizer system includes a natural language 
parser having associated data store containing 
a representation of an electronic program guide, 
and wherein said parser selectively extracts information 
from said program guide based on speech 
information input through said microphone. 

19. The remote control device of claim 16 wherein said 
speech recognizer system includes a data store 
containing a representation of an electronic program 
guide and a system for selectively updating 
the contents of said data store. 

20. The remote control device of claim 19 wherein said 
system for selectively updating the contents of said 
data store includes a tuner for accessing a source 
of electronic program guide information. 

21. The remote control device of claim 19 wherein said 
system for selectively updating the contents of said 
data store includes an internet access system for 
accessing a source of electronic program guide 
information. 

22. The remote control device of claim 16 wherein said 
speech recognizer has a first component disposed 
in said handheld case and a second component 
disposed outside said handheld case. 

23. The remote control device of claim 22 wherein said 
first component generates an audio domain signal 
for transmission to said second component. 

24. The remote control device of claim 22 wherein said 
first component extracts speech parameters from 
input speech from a user and transmits said parameters 
to said second component for recognition. 

25. The remote control device of claim 1 wherein said 
speech recognizer has a first component disposed 
in said handheld case and a second component 
disposed outside said handheld case. 

26. The remote control device of claim 25 wherein said 
first component generates an audio domain signal 
for transmission to said second component. 

27. The remote control device of claim 25 wherein said 
first component extracts speech parameters from 
input speech from a user and transmits said parameters 
to said second component for recognition. 
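
The split recognizer of claims 22 to 27 can be sketched as follows. This is an illustrative assumption only: the claims do not say which speech parameters are extracted or how they are transmitted. Here per-frame energy and zero-crossing count stand in for the speech parameters, JSON stands in for the transmission format, and the function names are hypothetical.

```python
from __future__ import annotations

import json


def extract_parameters(samples: list[float], frame_size: int = 4) -> list[dict]:
    """First component (in the handheld case): reduce audio samples to per-frame parameters."""
    parameters = []
    # Step through the signal in non-overlapping frames; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        parameters.append({"energy": energy, "zero_crossings": zero_crossings})
    return parameters


def encode_for_transmission(parameters: list[dict]) -> bytes:
    """Serialize the parameters for sending to the second component."""
    return json.dumps(parameters).encode("utf-8")


def decode_on_receiver(payload: bytes) -> list[dict]:
    """Second component (outside the case): recover the parameters for recognition."""
    return json.loads(payload.decode("utf-8"))
```

Sending parameters rather than the audio domain signal of claims 23 and 26 reduces the amount of data that leaves the handheld case, at the cost of doing some signal processing inside it.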



FIG. 1 

FIG. 2 

[Drawing sheet. Labels recoverable from FIGS. 1 and 2: Input Speech; User Profile Data; Synthesized Speech; On-Screen Display; Remote Display; Natural Language Parser; Dialog Manager; Grammars; Semantic Representation of Electronic Program Guide; Tuner; Recorder. Reference numerals: 52, 54, 56.] 



FIG. 3 

[Drawing sheet. Labels recoverable from FIG. 3: Knowledge Database; Speech Understanding (English to semantic components); Translation (semantic components to English); Dialog History. Reference numerals: 140, 160, 164, 165, 166, 167.] 

FIG. 4 

[Drawing sheet. Labels recoverable from FIG. 4: Automatic Speech Recognition (ASR); Input Sentence; Lexical Analysis; Word Confidence Vector; Context Info; Dialogue Context Weights; Parse Forest Generator (shown twice); Tag & Score Generator for A; Tag-II; Tag-N; Lexical Filter; (I) (II) ... (M); Tag & Score Generator for B; Tag-I, Tag-II, Tag-M; N-Best Tag Selection & combining; Tagged Text output; Global Parser; Dialogue Manager. Reference numerals: 130, 160, 162, 217, 218, 220, 228, 230, 232, 234, 260, 262, 267, 270, 280.] 



European Patent Office 

EUROPEAN SEARCH REPORT 

Application Number: EP 00 30 6975 



DOCUMENTS CONSIDERED TO BE RELEVANT 

Each entry gives the category, the citation of the document with indication, 
where appropriate, of relevant passages, and the claims to which it is relevant. 

Category: E 
Citation: EP 1 037 463 A (MATSUSHITA ELECTRIC IND CO LTD) 
20 September 2000 (2000-09-20) 
* page 3, column 4, line 28 - page 7, column 12, line 12; claims 1-20; 
figures 1-3 * 
Relevant to claim: 1,3,5-12 

Category: E 
Citation: EP 1 033 701 A (MATSUSHITA ELECTRIC IND CO LTD) 
6 September 2000 (2000-09-06) 
* page 3, column 3, line 41 - page 5, column 8, line 55; claims 1-20; 
figures 1,2 * 
Relevant to claim: 1,3,5-12 

Category: X (claims 1,3,5,6,8,9,11); Y (claims 2,7,13,15-27) 
Citation: US 5 774 859 A (HOUSER PETER B ET AL) 
30 June 1998 (1998-06-30) 
* column 15, line 19 - column 16, line 50; figures 4-6 * 
* column 23, line 38 - line 50 * 
* column 30, line 19 - column 31, line 20 * 

Category: Y 
Citation: DE 40 29 697 A (PIONEER ELECTRONIC CORP) 
4 July 1991 (1991-07-04) 
* column 5, line 62 - column 6, line 41; figure 3 * 
Relevant to claim: 2 

Category: Y 
Citation: WO 97 48230 A (STARSIGHT TELECAST INC) 
18 December 1997 (1997-12-18) 
* page 5, line 9 - line 37 * 
Relevant to claim: 7,21 

Category: Y 
Citation: EP 0 838 945 A (MATSUSHITA ELECTRIC IND CO LTD) 
29 April 1998 (1998-04-29) 
* abstract; figure 1 * 
* page 7, line 18 - line 37; claims 1,2 * 
Relevant to claim: 13 

-/-- 

CLASSIFICATION OF THE APPLICATION (Int.Cl.7): G10L15/26; H04N5/445 

TECHNICAL FIELDS SEARCHED (Int.Cl.7): G10L; H04N 




The present search report has been drawn up for all claims 

Place of search: THE HAGUE 
Date of completion of the search: 4 December 2000 
Examiner: Wanzeele, R 



CATEGORY OF CITED DOCUMENTS 

X : particularly relevant if taken alone 
Y : particularly relevant if combined with another 
document of the same category 
A : technological background 
O : non-written disclosure 
P : intermediate document 
T : theory or principle underlying the invention 
E : earlier patent document, but published on, or 
after the filing date 
D : document cited in the application 
L : document cited for other reasons 
& : member of the same patent family, corresponding 
document 



EUROPEAN SEARCH REPORT 

Application Number: EP 00 30 6975 

DOCUMENTS CONSIDERED TO BE RELEVANT 

Citation: WO 98 16062 A (CHANG ALLEN) 
16 April 1998 (1998-04-16) 
* abstract; figures 1,2 * 
* claims 1-5 * 

Citation: US 5 878 385 A (BRALICH PHILLIP A ET AL) 
2 March 1999 (1999-03-02) 
* abstract * 
* column 6, line 31 - column 7, line 9; figure 1 * 

Relevant to claim: 15-27 




ANNEX TO THE EUROPEAN SEARCH REPORT 
ON EUROPEAN PATENT APPLICATION NO. EP 00 30 6975 

This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report. 
The members are as contained in the European Patent Office EDP file on 04-12-2000. 
The European Patent Office is in no way liable for these particulars which are merely given for the purpose of information. 

Patent document           Publication    Patent family       Publication 
cited in search report    date           member(s)           date 

EP 1037463 A              20-09-2000     NONE 

EP 1033701 A              06-09-2000     JP 2000250575 A     14-09-2000 

US 5774859 A              30-06-1998     AU 4748896 A        31-07-1996 
                                         WO 9621990 A        18-07-1996 

DE 4029697 A              04-07-1991     JP 3202900 A        04-09-1991 
                                         JP 3203796 A        05-09-1991 
                                         JP 3203795 A        05-09-1991 
                                         JP 3203797 A        05-09-1991 
                                         US 5267323 A        30-11-1993 

WO 9748230 A              18-12-1997     AU 3294997 A        07-01-1998 
                                         US 6133909 A        17-10-2000 

EP 0838945 A              29-04-1998     US 5889506 A        30-03-1999 
                                         JP 10191468 A       21-07-1998 

WO 9816062 A              16-04-1998     AU 4896397 A        05-05-1998 
                                         CN 1237308 A        01-12-1999 
                                         EP 0931415 A        28-07-1999 

US 5878385 A              02-03-1999     AU 4416797 A        02-04-1998 
                                         WO 9811491 A        19-03-1998 

For more details about this annex: see Official Journal of the European Patent Office, No. 12/82 



